Conversation

@Mikep86 Mikep86 commented Aug 4, 2025

This is a POC implementation of cross-cluster search (CCS) support for the semantic query when ccs_minimize_roundtrips=true.

It implements:

  • semantic query multi-index handling (adapted from Support using the semantic query across multiple inference IDs #120755)
  • semantic query CCS support when ccs_minimize_roundtrips=true
  • Detection of when ccs_minimize_roundtrips=false
  • Integration tests demonstrating the high-level functionality
  • A way to reuse local embeddings on remote clusters when compatible
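The "detection" item above can be sketched in isolation. The class and method names below (CcsSupport, RewriteContext, checkCcsSupported) are hypothetical stand-ins, not the PR's actual API; they only illustrate rejecting the unsupported mode up front:

```java
// Minimal standalone sketch of detecting ccs_minimize_roundtrips=false.
// All names here (CcsSupport, RewriteContext) are hypothetical illustrations,
// not the PR's actual classes.
public class CcsSupport {

    /** Hypothetical stand-in for the flag a query rewrite context would carry. */
    public record RewriteContext(boolean ccsMinimizeRoundtrips, boolean crossClusterSearch) {}

    /**
     * Reject the unsupported combination up front so the user gets a clear
     * error instead of a confusing partial result.
     */
    public static void checkCcsSupported(RewriteContext context) {
        if (context.crossClusterSearch() && context.ccsMinimizeRoundtrips() == false) {
            throw new IllegalArgumentException(
                "semantic query does not support cross-cluster search when ccs_minimize_roundtrips=false"
            );
        }
    }

    public static void main(String[] args) {
        checkCcsSupported(new RewriteContext(true, true)); // supported: no exception
        try {
            checkCcsSupported(new RewriteContext(false, true));
            throw new AssertionError("expected rejection");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```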

Comment on lines +109 to +110
SemanticQueryBuilder queryBuilder = new SemanticQueryBuilder(INFERENCE_FIELD, "foo");
queryBuilder.setModelRegistrySupplier(() -> modelRegistry);
Contributor Author

This is admittedly a hacky way to pass the model registry to the semantic query, but I was looking for a way to do it that didn't involve a lot of refactoring. The proper way to do this is likely through the constructor.

Comment on lines -226 to 322

String inferenceId = getInferenceIdForForField(resolvedIndices.getConcreteLocalIndicesMetadata().values(), fieldName);
SetOnce<InferenceServiceResults> inferenceResultsSupplier = new SetOnce<>();
boolean noInferenceResults = false;
if (inferenceId != null) {
    InferenceAction.Request inferenceRequest = new InferenceAction.Request(
        TaskType.ANY,
        inferenceId,
        null,
        null,
        null,
        List.of(query),
        Map.of(),
        InputType.INTERNAL_SEARCH,
        null,
        false
    );

    queryRewriteContext.registerAsyncAction(
        (client, listener) -> executeAsyncWithOrigin(
            client,
            ML_ORIGIN,
            InferenceAction.INSTANCE,
            inferenceRequest,
            listener.delegateFailureAndWrap((l, inferenceResponse) -> {
                inferenceResultsSupplier.set(inferenceResponse.getResults());
                l.onResponse(null);
            })
        )
    );
MapEmbeddingsProvider currentEmbeddingsProvider;
if (embeddingsProvider != null) {
    if (embeddingsProvider instanceof MapEmbeddingsProvider mapEmbeddingsProvider) {
        currentEmbeddingsProvider = mapEmbeddingsProvider;
    } else {
        throw new IllegalStateException("Current embeddings provider should be a MapEmbeddingsProvider");
    }
} else {
    // The inference ID can be null if either the field name or index name(s) are invalid (or both).
    // If this happens, we set the "no inference results" flag to true so the rewrite process can continue.
    // Invalid index names will be handled in the transport layer, when the query is sent to the shard.
    // Invalid field names will be handled when the query is re-written on the shard, where we have access to the index mappings.
    noInferenceResults = true;
    currentEmbeddingsProvider = new MapEmbeddingsProvider();
}

return new SemanticQueryBuilder(this, noInferenceResults ? null : inferenceResultsSupplier, null, noInferenceResults);
boolean modified = false;
if (queryRewriteContext.hasAsyncActions() == false) {
    ModelRegistry modelRegistry = modelRegistrySupplier.get();
    if (modelRegistry == null) {
        throw new IllegalStateException("Model registry has not been set");
    }

    Set<String> inferenceIds = getInferenceIdsForForField(resolvedIndices.getConcreteLocalIndicesMetadata().values(), fieldName);
    for (String inferenceId : inferenceIds) {
        MinimalServiceSettings serviceSettings = modelRegistry.getMinimalServiceSettings(inferenceId);
        InferenceEndpointKey inferenceEndpointKey = new InferenceEndpointKey(inferenceId, serviceSettings);

        if (currentEmbeddingsProvider.getEmbeddings(inferenceEndpointKey) == null) {
            InferenceAction.Request inferenceRequest = new InferenceAction.Request(
                TaskType.ANY,
                inferenceId,
                null,
                null,
                null,
                List.of(query),
                Map.of(),
                InputType.INTERNAL_SEARCH,
                null,
                false
            );

            queryRewriteContext.registerAsyncAction(
                (client, listener) -> executeAsyncWithOrigin(
                    client,
                    ML_ORIGIN,
                    InferenceAction.INSTANCE,
                    inferenceRequest,
                    listener.delegateFailureAndWrap((l, inferenceResponse) -> {
                        currentEmbeddingsProvider.addEmbeddings(
                            inferenceEndpointKey,
                            validateAndConvertInferenceResults(inferenceResponse.getResults(), fieldName, inferenceId)
                        );
                        l.onResponse(null);
                    })
                )
            );

            modified = true;
        }
    }
}

return modified ? new SemanticQueryBuilder(this, currentEmbeddingsProvider, false) : this;
}
Contributor Author

This logic demonstrates a way to reuse embeddings cross-cluster, when they are compatible. For the sake of this POC I chose to use the combination of inference ID + minimal service settings to qualify inference endpoints as equal.
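The reuse idea above can be shown as a standalone sketch: embeddings are cached per (inference ID, minimal service settings) pair, so endpoints that look identical across clusters share one set of query embeddings. EndpointKey and ServiceSettings below are hypothetical simplifications of the PR's InferenceEndpointKey and MinimalServiceSettings:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone sketch of the embedding-reuse idea. The types EndpointKey and
// ServiceSettings are hypothetical simplifications, not the PR's classes.
public class EmbeddingReuseSketch {

    /** Simplified stand-in for MinimalServiceSettings. */
    record ServiceSettings(String taskType, int dimensions, String similarity) {}

    /** Records get value-based equals()/hashCode() for free, which makes them usable as map keys. */
    record EndpointKey(String inferenceId, ServiceSettings settings) {}

    private final Map<EndpointKey, List<Float>> embeddingsByEndpoint = new HashMap<>();

    /** Compute embeddings once per compatible endpoint; reuse them on later lookups. */
    List<Float> getOrCompute(EndpointKey key, List<Float> freshlyComputed) {
        return embeddingsByEndpoint.computeIfAbsent(key, k -> freshlyComputed);
    }

    public static void main(String[] args) {
        EmbeddingReuseSketch cache = new EmbeddingReuseSketch();
        ServiceSettings settings = new ServiceSettings("text_embedding", 384, "cosine");

        // Same inference ID + same settings on a remote cluster -> same key -> reuse.
        List<Float> local = cache.getOrCompute(new EndpointKey("my-elser", settings), List.of(0.1f, 0.2f));
        List<Float> remote = cache.getOrCompute(new EndpointKey("my-elser", settings), List.of(9.9f, 9.9f));
        System.out.println("reused: " + (local == remote)); // reused: true
    }
}
```

The design choice under discussion is exactly what goes into the key: adding the cluster name (as suggested below) would scope entries per cluster and disable cross-cluster reuse.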

Member

I wonder if we could make this even simpler. For example, apply a warning if just the inference IDs are different. It is just a warning, after all, that there is some potential detected difference here. This may not be perfect compared to consulting the model registry to detect different models. We could also consider a flag for a lenient mode to suppress warnings if people intentionally want to use different inference IDs.

Contributor Author

Understood, there could be a gap here in detecting compatible embeddings. If we want to be more conservative, we could use something like cluster name + inference ID to identify embeddings in the map. That would mean no embedding reuse cross-cluster, though.

Setting a warning doesn't work for CCS, though, as warning headers are not transmitted back to the primary cluster.

@Mikep86 Mikep86 requested review from jimczi and kderusso August 6, 2025 18:33
Member

@kderusso kderusso left a comment

Nice POC! I've left some high-level comments on functionality since this is still a POC.

Rewriteable.rewriteAndFetch(
original,
searchService.getRewriteContext(timeProvider::absoluteStartMillis, resolvedIndices, null),
searchService.getRewriteContext(timeProvider::absoluteStartMillis, resolvedIndices, null, original.isCcsMinimizeRoundtrips()),
Member

Is there precedent for having CCS-specific knobs like this in generic search code?

Contributor Author

Not sure, but there's a good case to be made for why this is necessary. The CCS mode affects the query rewrite cycle, so we need a way to know about it within that context. Info is passed to the query rewrite via QueryRewriteContext, hence this implementation.

import java.io.IOException;
import java.util.Objects;

public class InferenceEndpointKey implements Writeable {
Member

I like this conceptually, but to be nitpicky, I would like to find a better name for it.

Contributor Author

Sure, I wasn't focusing too much on names, just proving out the functionality.


ModelRegistry modelRegistry = modelRegistrySupplier.get();
if (modelRegistry == null) {
    throw new IllegalStateException("Model registry has not been set");
Member

We may need to test this to make sure it should actually always return a 500 / trigger a serverless alert, similar to some other alerts we've been seeing for semantic queries.

Contributor Author

This is one of those "should never happen in production" errors. If it does, it's a symptom of an upstream problem that we should be alerted to.

);
MapEmbeddingsProvider currentEmbeddingsProvider;
if (embeddingsProvider != null) {
    if (embeddingsProvider instanceof MapEmbeddingsProvider mapEmbeddingsProvider) {
Member

Not sure if this check should be necessary

Contributor Author

This is another one of those "this should never fail in production" cases. If we get here, it means we're performing coordinator node rewrite. That means that we're performing inference, either on a local or remote cluster.

If we're on a local cluster, we can assume that this node built the initial query and thus the embeddings provider should be a MapEmbeddingsProvider.

If we're on a remote cluster, we can assume that the primary (i.e. local) cluster allows semantic queries to perform CCS, which is directly correlated to usage of MapEmbeddingsProvider.

Either way, we need the representation to be MapEmbeddingsProvider so that we can call addEmbeddings later, hence this check.
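The defensive cast being discussed can be shown in isolation. The types below are minimal stand-ins for the PR's EmbeddingsProvider hierarchy, not the actual classes; the point is the pattern-matching instanceof with a loud failure on the impossible branch:

```java
// Minimal stand-in hierarchy, only to illustrate the check under discussion.
public class ProviderCheckSketch {

    interface EmbeddingsProvider {}
    static class MapEmbeddingsProvider implements EmbeddingsProvider {}
    static class OtherProvider implements EmbeddingsProvider {}

    // On the coordinator-rewrite path the provider must support adding embeddings,
    // so anything other than MapEmbeddingsProvider indicates an upstream bug.
    static MapEmbeddingsProvider requireMapProvider(EmbeddingsProvider provider) {
        if (provider instanceof MapEmbeddingsProvider mapProvider) {
            return mapProvider; // pattern variable binds the narrowed type
        }
        throw new IllegalStateException("Current embeddings provider should be a MapEmbeddingsProvider");
    }

    public static void main(String[] args) {
        System.out.println(requireMapProvider(new MapEmbeddingsProvider()).getClass().getSimpleName());
    }
}
```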

)
);
} else if (inferenceResultsList.size() > 1) {
// The inference call should truncate if the query is too large.
Member

Not sure if that's the case for all models? For example we warn in our docs that OpenAI will error if BYO chunks are too large.

Contributor Author

Remember that we are handling query-time inference here, which will never chunk. We should always get back one inference result. If we get back more, something has gone horribly wrong in the Inference API that we want to know about, hence this check.

This comment may not be fully technically correct, in that some providers may error instead of truncating on huge input. However, in that case we will still get back only one inference result; it will just be an instance of ErrorInferenceResults.
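The invariant described above (query-time inference never chunks, so exactly one result is expected) can be sketched standalone; the method name requireSingleResult is a hypothetical illustration, not the PR's actual helper:

```java
import java.util.List;

// Sketch of the invariant under discussion: zero or many results indicates an
// upstream Inference API bug that should be surfaced loudly rather than
// silently handled. The name requireSingleResult is hypothetical.
public class SingleResultCheck {

    static <T> T requireSingleResult(List<T> results, String inferenceId) {
        if (results.isEmpty()) {
            throw new IllegalStateException("No inference results returned for [" + inferenceId + "]");
        } else if (results.size() > 1) {
            throw new IllegalStateException(
                "Expected exactly one inference result for [" + inferenceId + "], got " + results.size()
            );
        }
        return results.get(0);
    }

    public static void main(String[] args) {
        System.out.println(requireSingleResult(List.of("embedding"), "my-endpoint"));
        try {
            requireSingleResult(List.of("a", "b"), "my-endpoint");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that a provider that errors on oversized input still satisfies this check: the single returned element would simply be an error result rather than an embedding.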

@Mikep86 Mikep86 commented Aug 25, 2025

Superseded by #133466

@Mikep86 Mikep86 closed this Aug 25, 2025